Recursive Adaptation of Stepsize Parameter for Non-stationary Environments

Author

  • Itsuki Noda
Abstract

In this article, we propose a method to adapt the stepsize parameter used in reinforcement learning for dynamic environments. In typical reinforcement learning settings, the stepsize parameter is decreased to zero during learning, because the environment is generally assumed to be noisy but stationary, such that the true expected rewards are fixed. In contrast, we assume that in the real world the true expected reward changes over time, and hence the learning agent must adapt to the change through continuous learning. We derive the higher-order derivatives of the exponential moving average (which is used to estimate the expected values of states or actions in most major reinforcement learning methods) with respect to the stepsize parameter. We also illustrate a mechanism to calculate these derivatives in a recursive manner. Using this mechanism, we construct a precise and flexible adaptation method for the stepsize parameter in order to minimize square errors or maximize a certain criterion. The proposed method is validated both theoretically and experimentally.
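The recursive calculation mentioned in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm; it only assumes the standard exponential moving average update `estimate = (1 - alpha) * estimate + alpha * x` and differentiates that update with respect to `alpha` using the product rule. The function name `ema_with_derivatives` and its parameters are hypothetical, chosen for illustration.

```python
def ema_with_derivatives(observations, alpha, order=2, initial=0.0):
    """Run an exponential moving average and track its derivatives
    with respect to the stepsize alpha, up to the given order.

    Assumed update (standard EMA, not taken from the article's text):
        estimate <- (1 - alpha) * estimate + alpha * x
    Differentiating k times with respect to alpha gives the recursion
        d_k <- (1 - alpha) * d_k - k * d_{k-1}   (+ x when k == 1)
    where d_0 is the estimate itself.
    """
    estimate = initial
    # derivs[k - 1] holds the k-th derivative; the initial value does not
    # depend on alpha, so every derivative starts at zero.
    derivs = [0.0] * order
    for x in observations:
        new_derivs = []
        for k in range(1, order + 1):
            lower = estimate if k == 1 else derivs[k - 2]
            value = (1.0 - alpha) * derivs[k - 1] - k * lower
            if k == 1:
                value += x  # the alpha * x term contributes only to the first derivative
            new_derivs.append(value)
        # Update the estimate last so the derivative step uses the old values.
        estimate = (1.0 - alpha) * estimate + alpha * x
        derivs = new_derivs
    return estimate, derivs


if __name__ == "__main__":
    est, (d1, d2) = ema_with_derivatives([1.0, 2.0, 0.5, 3.0], alpha=0.3)
    print(est, d1, d2)
```

Because each derivative is updated from the previous step's values only, the cost per observation is linear in `order` and no history of past observations is stored. A stepsize adaptation rule could use these derivatives to adjust `alpha` toward a lower squared error, but the specific rule is left to the full article.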

Similar resources

Adaptive Mobile Localization Using the Residual Test Method

Determining mobile location from time of arrival (TOA) signals is a requirement in cellular mobile communication. In some previous methods, localization over non-line-of-sight (NLOS) paths can lead to large position errors. Also, for simplicity, most simulations treat actual non-stationary environments as stationary. This paper proposes (residual test + recursive least square)...

Steady-state Performance Analysis of Bayesian Adaptive Filtering

Adaptive filtering is in principle intended for tracking non-stationary systems. However, most adaptive filtering algorithms have been designed for converging to a fixed unknown filter. When actually confronted with a non-stationary environment, they possess just one parameter (stepsize, window size) to adjust their tracking capability. In the stationary case of non-stationarity, the optimal fil...

The Time Adaptive Self Organizing Map for Distribution Estimation

The feature map represented by the set of weight vectors of the basic SOM (Self-Organizing Map) provides a good approximation to the input space from which the sample vectors come. But the time-decreasing learning rate and neighborhood function of the basic SOM algorithm reduce its capability to adapt weights for a varied environment. In dealing with non-stationary input distributions and changi...

Stochastic Recursive Algorithms for Networked Systems with Delay and Random Switching: Multiscale Formulations and Asymptotic Properties

Motivated by consensus control of networked systems with communication latency and randomly switching topologies, this paper studies stochastic approximation (SA) algorithms for systems with time delays and randomly switching dynamics. To accommodate realistic time delay systems, our formulation of the discrete-time systems does not impose bounds on delays when the sampling intervals become sma...

Performance Analysis of Bayesian Adaptive Filtering

While adaptive filtering is in principle intended for tracking non-stationary systems, most adaptive filtering algorithms have been designed for converging to a fixed unknown filter. When actually confronted with a non-stationary environment, they possess just one parameter (stepsize, forgetting factor) to adjust their tracking capability. Virtually the only existing optimal approach is the Kal...

Publication date: 2009